Memory Scrubbing
   HOME

TheInfoList



OR:

Memory scrubbing consists of reading from each
computer memory In computing, memory is a device or system that is used to store information for immediate use in a computer or related computer hardware and digital electronic devices. The term ''memory'' is often synonymous with the term ''primary storage ...
location, correcting
bit error In digital transmission, the number of bit errors is the number of received bits of a data stream over a communication channel that have been altered due to noise, interference, distortion or bit synchronization errors. The bit error rate (BER ...
s (if any) with an error-correcting code ( ECC), and writing the corrected data back to the same location. Due to the high integration density of modern computer memory
chips ''CHiPs'' is an American crime drama television series created by Rick Rosner and originally aired on NBC from September 15, 1977, to May 1, 1983. It follows the lives of two motorcycle officers of the California Highway Patrol (CHP). The serie ...
, the individual memory cell structures became small enough to be vulnerable to
cosmic ray Cosmic rays are high-energy particles or clusters of particles (primarily represented by protons or atomic nuclei) that move through space at nearly the speed of light. They originate from the Sun, from outside of the Solar System in our own ...
s and/or
alpha particle Alpha particles, also called alpha rays or alpha radiation, consist of two protons and two neutrons bound together into a particle identical to a helium-4 nucleus. They are generally produced in the process of alpha decay, but may also be produce ...
emission. The errors caused by these phenomena are called
soft error In electronics and computing, a soft error is a type of error where a signal or datum is wrong. Errors may be caused by a defect, usually understood either to be a mistake in design or construction, or a broken component. A soft error is also a s ...
s. Over 8% of DIMM modules experience at least one correctable error per year. This can be a problem for
DRAM Dynamic random-access memory (dynamic RAM or DRAM) is a type of random-access semiconductor memory that stores each bit of data in a memory cell, usually consisting of a tiny capacitor and a transistor, both typically based on metal-oxid ...
and SRAM based memories. The probability of a soft error at any individual memory bit is very small. However, together with the large amount of memory modern computersespecially serversare equipped with, and together with extended periods of
uptime Uptime is a measure of system reliability, expressed as the percentage of time a machine, typically a computer, has been working and available. Uptime is the opposite of downtime. It is often used as a measure of computer operating system reliabi ...
, the probability of soft errors in the total memory installed is significant. The information in an
ECC memory Error correction code memory (ECC memory) is a type of computer data storage that uses an error correction code (ECC) to detect and correct n-bit data corruption which occurs in memory. ECC memory is used in most computers where data corruption c ...
is stored redundantly enough to correct single bit error per memory word. Hence, an ECC memory can support the scrubbing of the memory content. Namely, if the
memory controller The memory controller is a digital circuit that manages the flow of data going to and from the computer's main memory. A memory controller can be a separate chip or integrated into another chip, such as being placed on the same die or as an int ...
scans systematically through the memory, the single bit errors can be detected, the erroneous bit can be determined using the ECC
checksum A checksum is a small-sized block of data derived from another block of digital data for the purpose of detecting errors that may have been introduced during its transmission or storage. By themselves, checksums are often used to verify data ...
, and the corrected data can be written back to the memory.


Overview

It is important to check each memory location periodically, frequently enough, before ''multiple'' bit errors within the same word are too likely to occur, because the ''one'' bit errors can be corrected, but the ''multiple'' bit errors are not correctable, in the case of usual (as of 2008) ECC memory modules. In order to not disturb regular memory requests from the CPU and thus prevent decreasing
performance A performance is an act of staging or presenting a play, concert, or other form of entertainment. It is also defined as the action or process of carrying out or accomplishing an action, task, or function. Management science In the work place ...
, scrubbing is usually only done during idle periods. As the scrubbing consists of normal read and write operations, it may increase
power consumption Electric energy consumption is the form of energy consumption that uses electrical energy. Electric energy consumption is the actual energy demand made on existing electricity supply for transportation, residential, industrial, commercial, and ot ...
for the memory compared to non-scrubbing operation. Therefore, scrubbing is not performed continuously but periodically. For many servers, the scrub period can be configured in the
BIOS In computing, BIOS (, ; Basic Input/Output System, also known as the System BIOS, ROM BIOS, BIOS ROM or PC BIOS) is firmware used to provide runtime services for operating systems and programs and to perform hardware initialization during the ...
setup program. The normal memory reads issued by the CPU or DMA devices are checked for ECC errors, but due to
data locality In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
reasons they can be confined to a small range of addresses and keeping other memory locations untouched for a very long time. These locations can become vulnerable to more than one soft error, while scrubbing ensures the checking of the whole memory within a guaranteed time. On some systems, not only the main memory (DRAM-based) is capable of scrubbing but also the
CPU cache A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which ...
s (SRAM-based). On most systems the scrubbing rates for both can be set independently. Because cache is much smaller than the main memory, the scrubbing for caches does not need to happen as frequently. Memory scrubbing increases reliability, therefore it can be classified as a
RAS Ras or RAS may refer to: Arts and media * RAS Records Real Authentic Sound, a reggae record label * Rundfunk Anstalt Südtirol, a south Tyrolese public broadcasting service * Rás 1, an Icelandic radio station * Rás 2, an Icelandic radio stati ...
feature.


Variants

There are usually two variants, known as ''patrol scrubbing'' and ''demand scrubbing''. While they both essentially perform memory scrubbing and associated error correction (if it is doable), the main difference is how these two variants are initiated and executed. Patrol scrubbing runs in an automated manner when the system is idle, while demand scrubbing performs the error correction when the data is actually requested from main memory.


See also

*
Data scrubbing Data scrubbing is an error correction technique that uses a background task to periodically inspect main memory or storage for errors, then corrects detected errors using redundant data in the form of different checksums or copies of data. Data ...
, a general category containing memory scrubbing *
Soft error In electronics and computing, a soft error is a type of error where a signal or datum is wrong. Errors may be caused by a defect, usually understood either to be a mistake in design or construction, or a broken component. A soft error is also a s ...
, an important reason for doing memory scrubbing * Error detection and correction, a general theory used for memory scrubbing *
Memory refresh Memory refresh is the process of periodically reading information from an area of computer memory and immediately rewriting the read information to the same area without modification, for the purpose of preserving the information."refresh cycle" i ...
, which preserves information stored in memory


References

{{DEFAULTSORT:Memory Scrubbing Computer memory